A Linear-Time Burrows-Wheeler Transform Using Induced Sorting
Abstract
To compute the Burrows-Wheeler Transform (BWT), one usually builds a suffix array (SA) first and then derives the BWT from it, which requires a large amount of redundant working space. Previous approaches that compute the BWT directly [6, 13] construct it incrementally, which takes O(n log n) time, where n is the length of the input text. We present an algorithm that computes the BWT directly in linear time by modifying the suffix array construction algorithm based on induced sorting [16]. We show that the working space is O(n log σ log log_σ n) for any σ, where σ is the alphabet size, which is the smallest among the known linear-time algorithms.
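For orientation, the conventional route that the abstract contrasts itself with (build the suffix array, then read the BWT off it) can be sketched in a few lines of Python. This is a minimal illustration only, not the induced-sorting algorithm of the paper: the suffix array is built by naive comparison sorting, which is far from linear time, and the function names and the `$` sentinel are assumptions made for this example. The sentinel is assumed to be absent from the input and to sort before every input character.

```python
def suffix_array(text: str) -> list[int]:
    """Naive suffix array: sort suffix start positions lexicographically.

    Illustrative only; this is comparison sorting of whole suffixes,
    not a linear-time construction.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text: str, sentinel: str = "$") -> str:
    """Compute the BWT of text + sentinel by way of its suffix array.

    Assumption: the sentinel does not occur in text and compares
    smaller than every character of text.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    sa = suffix_array(s)
    # BWT[i] is the character that precedes the i-th smallest suffix.
    # For the suffix starting at 0, s[-1] wraps around to the sentinel.
    return "".join(s[i - 1] for i in sa)


if __name__ == "__main__":
    print(bwt_from_suffix_array("banana"))  # annb$aa
```

The full array `sa` holds n integers even though only the n output characters are wanted, which is the redundant working space the abstract refers to.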
Similar resources
A Bijective String Sorting Transform
Given a string of characters, the Burrows-Wheeler Transform rearranges the characters in it so as to produce another string of the same length which is more amenable to compression techniques such as move to front, run-length encoding, and entropy encoders. We present a variant of the transform which gives rise to similar or better compression value, but, unlike the original, the transform we p...
RadixZip: Linear-Time Compression of Token Streams
RadixZip is a block compression technique for token streams. It introduces RadixZip Transform, a linear time algorithm that rearranges bytes using a technique inspired by radix sorting. For appropriate data, RadixZip Transform is analogous to the Burrows-Wheeler Transform used in bzip2, but is both simpler in operation and more effective in compression. In addition, RadixZip Transform can take ...
A Text Transformation Scheme for Degenerate Strings
The Burrows-Wheeler Transformation computes a permutation of a string of letters over an alphabet, and is well-suited to compression-related applications due to its invertibility and data clustering properties. For space efficiency the input to the transform can be preprocessed into Lyndon factors. We consider scenarios with uncertainty regarding the data: a position in an indeterminate or degene...
Output Distribution of the Burrows-Wheeler Transform
The Burrows-Wheeler transform is a block-sorting algorithm which has been shown empirically to be useful in compressing text data. In this paper we study the output distribution of the transform for i.i.d. sources, tree sources and stationary ergodic sources. We can also give analytic bounds on the performance of some universal compression schemes which use the Burrows-Wheeler transform.
Two Space Saving Tricks for Linear Time LCP Array Computation
In this paper we consider the linear time algorithm of Kasai et al. [6] for the computation of the Longest Common Prefix (LCP) array given the text and the suffix array. We show that this algorithm can be implemented without any auxiliary array in addition to the ones required for the input (the text and the suffix array) and the output (the LCP array). Thus, for a text of length n, we reduce t...